For the final assignment, we are using an R markdown to illustrate our research.

SETUP

library(dplyr)
library(tidycensus)
library(tidyverse)
library(ggplot2)
library(kableExtra)
library(sf)
library(tmap)
library(RColorBrewer)
library(scales)
library(viridis)
library(viridisLite)
library(gridExtra) 
library(psych)
library(webshot2)
library(networkD3)
library(DiagrammeR)
library(diagram)
library(car)
library(corrr) 
library(stargazer)

INTRODUCTION

Public transportation systems play a pivotal role in urban mobility and socio-economic integration within cities. In Philadelphia, the Southeastern Pennsylvania Transportation Authority (SEPTA) is at the forefront of providing critical transit services that connect diverse communities. The COVID-19 pandemic introduced substantial disruptions to this network, prompting a significant decline in ridership due to public health concerns and changes in commuting patterns. As cities begin to recover, understanding the factors influencing ridership recovery is crucial for adapting and improving transit services. Particularly, SEPTA is initiating a ridership recovery program by comparing ridership data between the most recent month in 2024 and pre-pandemic across different modes and routes in Philadelphia. In the latest ridership recovery report, system-wide ridership in March 2024 was 65% of pre-COVID March 2019 ridership, with 72% of Bus mode ridership recovery (SEPTA, 2024).

Based on SEPTA’s ridership recovery initiative, this analysis delves into the complex relationship between socioeconomic characteristics, urban infrastructure, and public transit usage in Philadelphia, focusing on bus services. In response to the pandemic, SEPTA implemented a comprehensive ridership recovery program aimed at reinstating full-service levels while ensuring safety for all passengers. This study explores how these efforts have interacted with the socioeconomic and built environment conditions across different census tracts in Philadelphia.

RESEARCH QUESTION

The research question guiding this analysis is: “How do socioeconomic status and existing urban infrastructure conditions across different census tracts in Philadelphia impact the changes in bus ridership observed between the pre-COVID and post-COVID periods?” By examining these variables, the study aims to provide actionable insights that can inform policy decisions and contribute to the development of a more resilient and inclusive public transit system in the post-pandemic world.

SURVEY OF LITERATURE

The impact of the COVID-19 pandemic on public transit systems has been substantial, with research showing a marked decline in ridership across the United States (Liu et al., 2020). This decline has been attributed to various factors, including public health concerns and changes in commuting habits. As part of this broader trend, studies like that of Hu and Chen (2021) have explored the role that socioeconomic disparities play in influencing transit ridership. Their findings suggest that lower-income communities may experience more significant impacts due to their greater reliance on public transportation. Additionally, research by Wilbur et al. (2023) has shown that educational levels within communities can significantly influence changes in ridership, with more educated populations possibly adapting more rapidly to alternative forms of work, such as remote work, thereby reducing their reliance on public transportation.

Policy and economic considerations also play a critical role in understanding transit ridership trends. The Economy League of Greater Philadelphia highlights that SEPTA’s funding challenges, exacerbated by the pandemic, have potentially affected service offerings and recovery efforts, impacting ridership further (Economy League of Greater Philadelphia, 2022). Moreover, before the pandemic, there was already a trend of declining ridership due to factors like increasing car ownership and urban migration patterns, which have shifted populations away from traditional transit hubs and services (Economy League of Greater Philadelphia, 2022).

In conclusion, existing research has established analyses of COVID’s impact on public transportation ridership. Moreover, scholars have already noticed socioeconomic disparities on the ridership change during COVID. Also, policymakers are faced with challenges in lack of funding and anti-public transportation factors to maintain bus services and ridership. This study would be significant to provide policy implications regarding funding distribution and optimizing bus service development in the post pandemic period.

METHODOLOGY

The hierarchical graph reflects the research methodology pipeline. The first step involved selecting and gathering relevant data for analysis from the public domain. We went back and forth to add variable to the model. In Exploratory Data Analysis we examined data distributions, identifying outliers, assessing missing values, and visualizing relationships between variables using techniques like histograms, scatter plots, and summary statistics. All NAs were removed at this stage. This stage includes three sub-stages. Data Categorization involved creating categorical variables from continuous data and mutating new columns such as Vehicle Ownership Rate, Percentage of women or black or white folks in a census tract. Age as a categorical variable for each tract was replaced with Median Age in the final dataset. This is because age_cat as a variable did not highlight a pattern for the census tract as a whole. While analyzing scatterplots, we observed most variables had a large positive skew. In other cases, taking the logarithm of the dependent variable helped establish a more linear relationship between the independent and dependent variables. Therefore, we decided to use the log of Bus Boardings for both years as our dependent variable. Correlation Matrix was created to examine the strength and direction of associations between pairs of variables. Linear Regression has been used to model to gauge the influence of one or more independent variables on the log transformed dependent variable. Goodness of fit measures such as variance, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC) were used to determine the best fit model out of the various models created. It includes assessing model accuracy, checking for overfitting or underfitting. In the second to last step, the selected model for each year were compared against each other. The final step involved interpreting the results of the statistical analysis and drawing meaningful conclusions.

DiagrammeR::grViz("digraph {

graph [layout = dot, rankdir = TB]

# define the global styles of the nodes. We can override these in box if we wish
node [shape = rectangle, style = filled, fillcolor = Blue, fontcolor = white]

data1 [label = 'Data Selection', fillcolor = Blue]
box1 [label =  'Exploratory \n Data \n Analysis']
box1.1 [label = 'Data Categorization \n Creating categorical variables']
box1.2 [label = 'Data Transformation \n Log transformation']
box1.3 [label = 'Correlation Test \n Check for degree of collinarity']
box2 [label =  'Linear \n Regression']
box3 [label =  'Goodness of Fit \n Measuring Variance, AIC and BIC']
box4 [label =  'Model Comparison \n Statistical significance']
results [label= 'Data Implication \n Ridership Trends']

# edge definitions with the node IDs
data1 -> box1 -> {box1.1 box1.2 box1.3}  -> box2 -> box3 -> box4 -> results
}")

DATA SELECTION

To perform an analysis on the trends in ridership recovery for SEPTA bus stops in Philadelphia, the following data sources were used:
SEPTA Bus On-boarding data 2019 & 2023
SEPTA Bus station 2019 & 2023
IndeGo Bike-share station usage data 2019 & 2023
Philadelphia Census Tracts 2021 (Opendataphilly)
ACS 2019 (get.acs/ Census Bureau)
ACS 2022 (get.acs/ Census Bureau)
Number of Jobs in Philadelphia 2019 & 2023 (OnTheMap)

Loading SEPTA and IndeGo data

# SEPTA Bus data
ride <- read.csv("D:/01UPENN/COURSES/CPLN5050_PlanningByNumbers/FinalProject/ridership.csv") %>%
  select(-Avg_Off_2023, -Avg_Off_2019, -Off_Change) %>%
  rename(GEOID = Census_Tract_ID)

ride19 <- ride %>%
  select(GEOID, Avg_On_2019)
ride23 <- ride %>%
  select(GEOID, Avg_On_2023)

#IndeGo Bike data
bike19 <- read.csv("D:/01UPENN/COURSES/CPLN5050_PlanningByNumbers/FinalProject/2019 trips.csv") %>%
  rename(GEOID = GEOID10,
         Bike.Trips = Trips)
bike23 <- read.csv("D:/01UPENN/COURSES/CPLN5050_PlanningByNumbers/FinalProject/2023 trips.csv") %>%
  rename(GEOID = GEOID10,
         Bike.Trips = Total.Trips) %>%
  select(-Bike.trips, -Point_Count)

# SEPTA Bus stops
BusStop19 <- read.csv("D:/01UPENN/COURSES/CPLN5050_PlanningByNumbers/FinalProject/busstop2019.csv") %>%
  select(GEOID10, Point_Count) %>%
  rename(GEOID = GEOID10,
         BusStops = Point_Count)
  
BusStop23 <- read.csv("D:/01UPENN/COURSES/CPLN5050_PlanningByNumbers/FinalProject/busstop2023.csv") %>%
  select(GEOID10, Point_Count) %>%
  rename(GEOID = GEOID10,
         BusStops = Point_Count)

# Jobs
job19 <- read.csv("D:/01UPENN/COURSES/CPLN5050_PlanningByNumbers/FinalProject/jobdensity2019.csv")%>%
  rename(GEOID = GEOID10)
job23 <- read.csv("D:/01UPENN/COURSES/CPLN5050_PlanningByNumbers/FinalProject/jobs2023.csv")%>%
  rename(GEOID = GEOID10,
         Jobs = sum_c000)

Loading Census tract data for 2019 and 2020

acs_vars <- c("B19001_001E",  # Median Household Income
              "B01001_001E",  # Total population
              "B01001_026E",  # Total Female
              "B01002_001E",  # Median age by census tract
              "B14001_001E",  # Total population 3 years and over enrolled in school
              "B23025_001E",  # Total number of workers 16 years and over
              "B08201_001E",  # Total households
              "B08201_002E",  # No vehicle available
              "B08301_003E",  # Car, truck, or van - drove alone
              "B08301_004E",  # Car, truck, or van - carpooled
              "B08301_012E",  # Public transportation (excluding taxicab)
              "B08301_013E",  # Walked
              "B08301_014E",  # Other means
              "B08301_015E",  # Worked from home
              "B03002_003E",  # White alone
              "B03002_004E",  # Black or African American alone
              "B03002_012E")  # Hispanic or Latino  
  
acsTractsPHL.2019 <- get_acs(geography = "tract",
                             year = 2019, 
                             variables = acs_vars,
                             geometry = TRUE,
                             state = "PA", 
                             county = "Philadelphia",
                             output = "wide") %>%
  dplyr::select (GEOID, NAME, all_of(acs_vars)) %>%
rename (median_household_income = B19001_001E,
        total_population = B01001_001E,
        total_female = B01001_026E,
        median_age = B01002_001E,
        total_school_enrolled = B14001_001E,
        total_workers_16_over = B23025_001E,
        total_households = B08201_001E,
        no_vehicle_available = B08201_002E,
        drove_alone = B08301_003E,
        carpooled = B08301_004E,
        public_transport = B08301_012E,
        walked = B08301_013E,
        Othermode = B08301_014E,
        WFH = B08301_015E,
        white_alone = B03002_003E,
        black_alone = B03002_004E,
        hispanic_latino = B03002_012E)

acsTractsPHL.2022 <- get_acs(geography = "tract",
                             year = 2022, 
                             variables = acs_vars,
                             geometry = TRUE,
                             state = "PA", 
                             county = "Philadelphia",
                             output = "wide") %>%
  dplyr::select (GEOID, NAME, all_of(acs_vars)) %>%
  rename (median_household_income = B19001_001E,
          total_population = B01001_001E,
          total_female = B01001_026E,
          median_age = B01002_001E,
          total_school_enrolled = B14001_001E,
          total_workers_16_over = B23025_001E,
          total_households = B08201_001E,
          no_vehicle_available = B08201_002E,
          drove_alone = B08301_003E,
          carpooled = B08301_004E,
          public_transport = B08301_012E,
          walked = B08301_013E,
          Othermode = B08301_014E,
          WFH = B08301_015E,
          white_alone = B03002_003E,
          black_alone = B03002_004E,
          hispanic_latino = B03002_012E)

EXPLORATORY ANALYSIS

Cleaning up data

In this step, we created new variables for both years. The ACS spatial data both years was first transformed to Pennsylvania North State Plane to calculate the area for each census tract. The newly created variables for our model are as follows: car: people who drove alone or carpooled
veh_own: Vehicle Ownership rate defined as the percentage of households that own vehicles out of the total number of households
perc_female: the percentage of females in a census track out of the total population
perc_white: the percentage of white population in a census tract out of the total population
perc_black: the percentage of black population in a census tract out of the total population
perc_hispanic: the percentage of hispanic or latino population in a census track out of the total population
area: total geographical area of a census tract in square miles
pop_density: population per square miles

# 2019
acsTractsPHL.2019 <- acsTractsPHL.2019 %>%
  st_transform("EPSG:2272")
acsTractsPHL.2019 <- acsTractsPHL.2019 %>%
  mutate(Car = drove_alone+carpooled,
         veh_own = (total_households- no_vehicle_available)/(total_households)*100,
         perc_female = total_female/total_population,
         perc_white = white_alone/total_population,
         perc_black = black_alone/total_population,
         perc_hispanic = hispanic_latino/total_population,
         perc_school_enrolled = total_school_enrolled/total_population,
         area = as.numeric(st_area(acsTractsPHL.2019$geometry)),
         area = area/27878400, #square mile
         pop_density = total_population/area) %>%
  select(-drove_alone, -carpooled, -total_households, -no_vehicle_available, -white_alone, -black_alone, -total_female, -total_school_enrolled, -hispanic_latino)

# Vehicle Ownership Rate 2019
acsTractsPHL.2019$veh_own_ca[acsTractsPHL.2019$veh_own <= 57.84] <- "Q1"
acsTractsPHL.2019$veh_own_ca[acsTractsPHL.2019$veh_own > 57.84 & acsTractsPHL.2019$veh_own <= 69.33] <- "Q2"
acsTractsPHL.2019$veh_own_ca[acsTractsPHL.2019$veh_own > 69.33 & acsTractsPHL.2019$veh_own <= 82.55] <- "Q3"
acsTractsPHL.2019$veh_own_ca[acsTractsPHL.2019$veh_own > 82.55] <- "Q4"
acsTractsPHL.2019$veh_own_ca <- as.factor(acsTractsPHL.2019$veh_own_ca)

# Income Cat 2019
acsTractsPHL.2019$income_ca[acsTractsPHL.2019$median_household_income <= 1162] <- "Q1"
acsTractsPHL.2019$income_ca[acsTractsPHL.2019$median_household_income > 1162 & acsTractsPHL.2019$median_household_income <= 1556] <- "Q2"
acsTractsPHL.2019$income_ca[acsTractsPHL.2019$median_household_income > 1556 & acsTractsPHL.2019$median_household_income <= 1990] <- "Q3"
acsTractsPHL.2019$income_ca[acsTractsPHL.2019$median_household_income > 1990] <- "Q4"
acsTractsPHL.2019$income_ca <- as.factor(acsTractsPHL.2019$income_ca)

#  2022
acsTractsPHL.2022 <- acsTractsPHL.2022 %>%
  st_transform("EPSG:2272")
acsTractsPHL.2022 <- acsTractsPHL.2022 %>%
  mutate(Car = drove_alone+carpooled,
         veh_own = (total_households- no_vehicle_available)/(total_households)*100,
         perc_female = total_female/total_population,
         perc_white = white_alone/total_population,
         perc_black = black_alone/total_population,
         perc_hispanic = hispanic_latino/total_population,
         perc_school_enrolled = total_school_enrolled/total_population,
         area = as.numeric(st_area(acsTractsPHL.2022$geometry)),
         area = area/27878400, #square mile
         pop_density = total_population/area) %>%
  select(-drove_alone, -carpooled, -total_households, -no_vehicle_available, -white_alone, -black_alone, -total_female, -total_school_enrolled, -hispanic_latino)

# Vehicle Ownership Rate 2022
acsTractsPHL.2022$veh_own_ca[acsTractsPHL.2022$veh_own <= 60.81] <- "Q1"
acsTractsPHL.2022$veh_own_ca[acsTractsPHL.2022$veh_own > 60.81 & acsTractsPHL.2022$veh_own <= 73.89] <- "Q2"
acsTractsPHL.2022$veh_own_ca[acsTractsPHL.2022$veh_own > 73.89 & acsTractsPHL.2022$veh_own <= 85.45] <- "Q3"
acsTractsPHL.2022$veh_own_ca[acsTractsPHL.2022$veh_own > 85.45] <- "Q4"
acsTractsPHL.2022$veh_own_ca <- as.factor(acsTractsPHL.2022$veh_own_ca)

# Income Cat 2022
acsTractsPHL.2022$income_ca[acsTractsPHL.2022$median_household_income <= 1162] <- "Q1"
acsTractsPHL.2022$income_ca[acsTractsPHL.2022$median_household_income > 1162 & acsTractsPHL.2022$median_household_income <= 1556] <- "Q2"
acsTractsPHL.2022$income_ca[acsTractsPHL.2022$median_household_income > 1556 & acsTractsPHL.2022$median_household_income <= 1990] <- "Q3"
acsTractsPHL.2022$income_ca[acsTractsPHL.2022$median_household_income > 1990] <- "Q4"
acsTractsPHL.2022$income_ca <- as.factor(acsTractsPHL.2022$income_ca)

Once the ACS data was cleaned, it was merged with the bus ridership, bike ridership, total number of bus stops and total number of jobs data sets for each census tract and year. A binary variable called ‘BikeStop’ indicating the presence of Bike-share stations in each census tract was created. Job density indicating the concentration of jobs within a tract was also created. All outliers and NAs were removed at this stage.

# merge 2019
dat19 <- merge(acsTractsPHL.2019, ride19, by = "GEOID", all.x = FALSE, all.y=FALSE, sort 
= FALSE)
dat19 <- merge(dat19, bike19, by = "GEOID", all.x = FALSE, all.y=FALSE, sort 
= FALSE)
dat19 <- merge(dat19, BusStop19, by = "GEOID", all.x = FALSE, all.y=FALSE, sort 
= FALSE)
dat19 <- merge(dat19, job19, by = "GEOID", all.x = FALSE, all.y=FALSE, sort 
= FALSE)

# IndeGo or Not 2019
dat19$BikeStop <- ifelse(dat19$Bike.Trips == 0, 0, 1)
# Job density 2019
dat19$job_density <- dat19$Jobs/dat19$area

# removing zeros and NAs
dat19 <- dat19 %>%
  filter(Avg_On_2019 != 0) %>%
  filter(median_household_income != 0)

# merge 2019
dat23 <- merge(acsTractsPHL.2022, ride23, by = "GEOID", all.x = FALSE, all.y=FALSE, sort 
= FALSE)
dat23 <- merge(dat23, bike23, by = "GEOID", all.x = FALSE, all.y=FALSE, sort 
= FALSE)
dat23 <- merge(dat23, BusStop23, by = "GEOID", all.x = FALSE, all.y=FALSE, sort 
= FALSE)
dat23 <- merge(dat23, job23, by = "GEOID", all.x = FALSE, all.y=FALSE, sort 
= FALSE)

# IndeGo or Not 2022
dat23$BikeStop <- ifelse(dat23$Bike.Trips == 0, 0, 1)
dat23 <- dat23 %>% na.omit()
# Job density 2022
dat23$job_density <- dat23$Jobs/dat23$area

Visualizing Change in Ridership

We used T-map to visualize the shift in ridership post-covid. Average On-boarding for tracts was categorized into 5 categories to maintain uniformity across the years. As seen from the two maps, ridership has fallen on the whole in Philadelphia. However, there are some tracts where it was stayed the same or improved. While in our research we are not looking at changes across geographies or spatial auto-correlation, it helps to visualize the trends in our dependent variable.

# factor levels
factor_levels <- c("0-250", "251-500", "501-750", "750-1000", "> 1000")

# Avg_On_2019_ca
dat19$Avg_On_2019_ca[dat19$Avg_On_2019 <= 250] <- "0-250"
dat19$Avg_On_2019_ca[dat19$Avg_On_2019 > 250 & dat19$Avg_On_2019 <= 500] <- "251-500"
dat19$Avg_On_2019_ca[dat19$Avg_On_2019 > 500 & dat19$Avg_On_2019 <= 750] <- "501-750"
dat19$Avg_On_2019_ca[dat19$Avg_On_2019 > 750 & dat19$Avg_On_2019 <= 1000] <- "750-1000"
dat19$Avg_On_2019_ca[dat19$Avg_On_2019 > 1000] <- "> 1000"

# Assign factor levels to the column
dat19$Avg_On_2019_ca <- factor(dat19$Avg_On_2019_ca, levels = factor_levels)


# Avg_On_2023_ca
dat23$Avg_On_2023_ca[dat23$Avg_On_2023 <= 250] <- "0-250"
dat23$Avg_On_2023_ca[dat23$Avg_On_2023 > 250 & dat23$Avg_On_2023 <= 500] <- "251-500"
dat23$Avg_On_2023_ca[dat23$Avg_On_2023 > 500 & dat23$Avg_On_2023 <= 750] <- "501-750"
dat23$Avg_On_2023_ca[dat23$Avg_On_2023 > 750 & dat23$Avg_On_2023 <= 1000] <- "750-1000"
dat23$Avg_On_2023_ca[dat23$Avg_On_2023 > 1000] <- "> 1000"

# Assign factor levels to the column
dat23$Avg_On_2023_ca <- factor(dat23$Avg_On_2023_ca, levels = factor_levels)

# Total Bus Boardings, Philadelphia 2019
tmap1 <- tm_shape(dat19) +
 tm_polygons(fill = "Avg_On_2019_ca",
 palette = "Blues", title = "Total Bus Boardings, 
 Philadelphia 2019") +
 tm_layout(legend.position = c("right", "bottom"))

# Total Bus Boardings, Philadelphia 2023
tmap2 <- tm_shape(dat23) +
 tm_polygons(fill = "Avg_On_2023_ca",
 palette = "Blues", title = "Total Bus Boardings, 
 Philadelphia 2023") +
 tm_layout(legend.position = c("right", "bottom"))

tmap_arrange(tmap1, tmap2)

Summary Statistic

As explained previously, we log transformed the dependent variable. In our finalized markdown, that step is included in this code chunk to be included in the summary stat. Here we have provided the list of variables as well as their count. Mean and median have been provided to indicate the average value and centrality in data while standard deviation indicates dispersion or skewness of data.

# log transform
dat19$log.Avg_On_2019 <- log(dat19$Avg_On_2019)
dat23$log.Avg_On_2023 <- log(dat23$Avg_On_2023)

# tables
table <- dat19 %>%
  st_drop_geometry() %>%
  select(-GEOID, -NAME) %>%
  format(scientific = FALSE) %>%
  describe(skew = F) %>%
  dplyr::select(n, mean, sd, min, max) %>%
  kable(caption = "Summary Statistics for Philadelphia, 2019") %>%
  kable_minimal()
table
Summary Statistics for Philadelphia, 2019
n mean sd min max
median_household_income* 355 164.059155 93.3756728 1 326
total_population* 355 169.842253 96.2516001 1 338
median_age* 355 87.014084 45.6847132 1 181
total_workers_16_over* 355 171.585915 98.0888294 1 342
public_transport* 355 63.104225 50.0361210 1 172
walked* 355 36.974648 36.8792032 1 129
Othermode* 355 6.076056 11.5698501 1 53
WFH* 355 1.042253 0.3918888 1 6
Car* 355 160.763380 94.2797874 1 327
veh_own* 355 178.000000 102.6239088 1 355
perc_female* 355 178.000000 102.6239088 1 355
perc_white* 355 178.000000 102.6239088 1 355
perc_black* 355 178.000000 102.6239088 1 355
perc_hispanic* 355 174.028169 102.5758603 1 351
perc_school_enrolled* 355 177.743662 102.2006401 1 342
area* 355 178.000000 102.6239088 1 355
pop_density* 355 178.000000 102.6239088 1 355
veh_own_ca* 355 2.498591 1.1208720 1 4
income_ca* 355 2.487324 1.1233185 1 4
Avg_On_2019* 355 172.433803 100.1557995 1 346
Bike.Trips* 355 6.859155 14.7725656 1 65
BusStops* 355 23.980282 14.9232369 1 67
OID_* 355 178.000000 102.6239088 1 355
Jobs* 355 147.402817 89.7892604 1 312
BikeStop* 355 1.180282 0.3849645 1 2
job_density* 355 178.000000 102.6239088 1 355
Avg_On_2019_ca* 355 2.709859 1.3727735 1 5
log.Avg_On_2019* 355 172.433803 100.1557995 1 346
table1 <- dat23 %>%
  st_drop_geometry() %>%
  select(-GEOID, -NAME) %>%
  format(scientific = FALSE) %>%
  describe(skew = F) %>%
  dplyr::select(n, mean, sd, min, max) %>%
  kable(caption = "Summary Statistics for Philadelphia, 2022-23") %>%
  kable_minimal()
table1
Summary Statistics for Philadelphia, 2022-23
n mean sd min max
median_household_income* 356 165.896067 94.6207129 1 331
total_population* 356 173.789326 99.5428027 1 346
median_age* 356 88.924157 47.4306893 1 188
total_workers_16_over* 356 170.216292 96.4030323 1 338
public_transport* 356 58.851124 51.3521701 1 173
walked* 356 25.092697 30.9053048 1 109
Othermode* 356 6.952247 12.7866775 1 57
WFH* 356 1.002809 0.0529999 1 2
Car* 356 160.609551 92.0886702 1 324
veh_own* 356 178.500000 102.9125843 1 356
perc_female* 356 178.500000 102.9125843 1 356
perc_white* 356 177.502809 102.9077393 1 355
perc_black* 356 178.500000 102.9125843 1 356
perc_hispanic* 356 169.626405 102.7008415 1 347
perc_school_enrolled* 356 178.070225 102.2127200 1 339
area* 356 178.500000 102.9125843 1 356
pop_density* 356 178.500000 102.9125843 1 356
veh_own_ca* 356 2.500000 1.1196076 1 4
income_ca* 356 2.752809 1.0802284 1 4
Avg_On_2023* 356 172.087079 99.4341781 1 345
Bike.Trips* 356 14.449438 26.2798638 1 98
BusStops* 356 23.896067 14.9573247 1 67
OID_* 356 178.500000 102.9125843 1 356
Jobs* 356 154.404494 92.0692904 1 319
BikeStop* 356 1.278090 0.4486885 1 2
job_density* 356 178.500000 102.9125843 1 356
Avg_On_2023_ca* 356 2.769663 1.2346093 1 5
log.Avg_On_2023* 356 172.087079 99.4341781 1 345

Histogram 2019

options(scipen = 999)
a0 <- ggplot(dat19, aes(x = log.Avg_On_2019)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Log Average On-boarding",
    title = "Distribution of Log Average On-boarding by Cenus Tract",
    caption = "Figure 0")+ theme_minimal()+
      theme(text = element_text(size = 8))

a1 <- ggplot(dat19, aes(x = median_household_income)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Median Household Income",
    title = "Distribution of Median Household Income",
    caption = "Figure 1")+ theme_minimal()+
      theme(text = element_text(size = 8))

a2 <- ggplot(dat19, aes(x = perc_school_enrolled)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Percentage of population enrolled in school",
    title = "Distribution of population enrolled in school",
    caption = "Figure 2")+ theme_minimal()+
      theme(text = element_text(size = 8))

a3 <- ggplot(dat19, aes(x = total_workers_16_over)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Total Number of Workers over 16 years",
    title = "Distribution Total Number of Workers over 16 years",
    caption = "Figure 3")+ theme_minimal()+
      theme(text = element_text(size = 8))

a4 <- ggplot(dat19, aes(x = median_age)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Median Age Census Tract",
    title = "Distribution of Median Age by Census Tract",
    caption = "Figure 4")+ theme_minimal()+
      theme(text = element_text(size = 8))

a5 <- ggplot(dat19, aes(x = pop_density*100)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Population density",
    title = "Distribution of Population across Census Tracts",
    caption = "Figure 5")+ theme_minimal()+
      theme(text = element_text(size = 8))

a6 <- ggplot(dat19, aes(x = log(job_density))) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Job density",
    title = "Distribution of Jobs across Census Tracts",
    caption = "Figure 6")+ theme_minimal()+
      theme(text = element_text(size = 8))

a7 <- ggplot(dat19, aes(x = veh_own)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Vehicle Ownership Rate",
    title = "Vehicle Ownership Rate by Census Tract",
    caption = "Figure 7")+ theme_minimal()+
      theme(text = element_text(size = 8))

a8 <- ggplot(dat19, aes(x = perc_female)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Percentage of Females",
    title = "Percentage of Females across Census Tract",
    caption = "Figure 8")+ theme_minimal()+
      theme(text = element_text(size = 8))

a9 <- ggplot(dat19, aes(x = perc_black)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Percentage of Black Residents",
    title = "Percentage of Black Residents by Census Tract",
    caption = "Figure 9")+ theme_minimal()+
      theme(text = element_text(size = 8))

a10 <- ggplot(dat19, aes(x = public_transport)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Mode of transportation to work: Public Transit",
    title = "Mode of transportation to work: Public Transit",
    caption = "Figure 10")+ theme_minimal()+
      theme(text = element_text(size = 8))

a11 <- ggplot(dat19, aes(x = Othermode)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Mode of transportation to work: Others",
    title = "Mode of transportation to work: Others",
    caption = "Figure 11")+ theme_minimal()+
      theme(text = element_text(size = 8))

a12 <- ggplot(dat19, aes(x = BusStops)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Number of Bus Stops",
    title = "Total Number of Bus Stops by Census Tract",
    caption = "Figure 12")+ theme_minimal()+
      theme(text = element_text(size = 8))

a13 <- ggplot(dat19, aes(x = log(Bike.Trips))) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Number of IndeGo Bike Trips",
    title = "Total Number of IndeGo Bike Trips by Census Tract",
    caption = "Figure 13")+ theme_minimal()+
      theme(text = element_text(size = 8))

a14 <- ggplot(dat19, aes(x = BikeStop)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Presence of IndeGo BikeShare Station",
    title = "Presence of IndeGo BikeShare Station",
    caption = "Figure 14")+ theme_minimal()+
      theme(text = element_text(size = 8))

grid.arrange(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, ncol = 3)

Histogram 2023

options(scipen = 999)
a0 <- ggplot(dat23, aes(x = log.Avg_On_2023)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Log Average On-boarding",
    title = "Distribution of Log Average On-boarding by Cenus Tract",
    caption = "Figure 0")+ theme_minimal()+
      theme(text = element_text(size = 8))

a1 <- ggplot(dat23, aes(x = median_household_income)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Median Household Income",
    title = "Distribution of Median Household Income",
    caption = "Figure 1")+ theme_minimal()+
      theme(text = element_text(size = 8))

a2 <- ggplot(dat23, aes(x = perc_school_enrolled)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Percentage of population enrolled in school",
    title = "Distribution of population enrolled in school",
    caption = "Figure 2")+ theme_minimal()+
      theme(text = element_text(size = 8))

a3 <- ggplot(dat23, aes(x = total_workers_16_over)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Total Number of Workers over 16 years",
    title = "Distribution Total Number of Workers over 16 years",
    caption = "Figure 3")+ theme_minimal()+
      theme(text = element_text(size = 8))

a4 <- ggplot(dat23, aes(x = median_age)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Median Age Census Tract",
    title = "Distribution of Median Age by Census Tract",
    caption = "Figure 4")+ theme_minimal()+
      theme(text = element_text(size = 8))

a5 <- ggplot(dat23, aes(x = pop_density*100)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Population density",
    title = "Distribution of Population across Census Tracts",
    caption = "Figure 5")+ theme_minimal()+
      theme(text = element_text(size = 8))

a6 <- ggplot(dat23, aes(x = log(job_density))) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Job density",
    title = "Distribution of Jobs across Census Tracts",
    caption = "Figure 6")+ theme_minimal()+
      theme(text = element_text(size = 8))

a7 <- ggplot(dat23, aes(x = veh_own)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Vehicle Ownership Rate",
    title = "Vehicle Ownership Rate by Census Tract",
    caption = "Figure 7")+ theme_minimal()+
      theme(text = element_text(size = 8))

a8 <- ggplot(dat23, aes(x = perc_female)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Percentage of Females",
    title = "Percentage of Females across Census Tract",
    caption = "Figure 8")+ theme_minimal()+
      theme(text = element_text(size = 8))

a9 <- ggplot(dat23, aes(x = perc_black)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Percentage of Black Residents",
    title = "Percentage of Black Residents by Census Tract",
    caption = "Figure 9")+ theme_minimal()+
      theme(text = element_text(size = 8))

a10 <- ggplot(dat23, aes(x = public_transport)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Mode of transportation to work: Public Transit",
    title = "Mode of transportation to work: Public Transit",
    caption = "Figure 10")+ theme_minimal()+
      theme(text = element_text(size = 8))

a11 <- ggplot(dat23, aes(x = Othermode)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Mode of transportation to work: Others",
    title = "Mode of transportation to work: Others",
    caption = "Figure 11")+ theme_minimal()+
      theme(text = element_text(size = 8))

a12 <- ggplot(dat23, aes(x = BusStops)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Number of Bus Stops",
    title = "Total Number of Bus Stops by Census Tract",
    caption = "Figure 12")+ theme_minimal()+
      theme(text = element_text(size = 8))

a13 <- ggplot(dat23, aes(x = log(Bike.Trips))) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Number of IndeGo Bike Trips",
    title = "Total Number of IndeGo Bike Trips by Census Tract",
    caption = "Figure 13")+ theme_minimal()+
      theme(text = element_text(size = 8))

a14 <- ggplot(dat23, aes(x = BikeStop)) +
  geom_histogram(fill = "blue", color = "black")+
  labs(x = "Presence of IndeGo BikeShare Station",
    title = "Presence of IndeGo BikeShare Station",
    caption = "Figure 14")+ theme_minimal()+
      theme(text = element_text(size = 8))

grid.arrange(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, ncol = 3)

Findings

The histograms suggest that while some variables related to socioeconomic status and urban infrastructure have remained stable, there are noticeable changes in transportation modes and infrastructure developments, particularly in bike-sharing, which may have impacted bus ridership post-COVID. For the dependent variable, the total onboarding decreased in 2023 for almost all census tracts compared to 2019. Moreover, for transportation, there has been a notable shift in the mode of transportation to work. Public transit usage appears to have been slightly reduced in 2023, which could be a direct impact of the pandemic. The vehicle ownership rates appear similar, suggesting stable car ownership rates. Also, The total number of bus stops by census tract shows some increase in certain tracts in 2023, which could be an attempt to enhance public transport accessibility. There is a clear increase in the number of Indego Bike trips, indicating a rise in the use of bike-sharing systems, possibly due to changes in mobility preferences post-COVID. The presence of Indego Bike-share stations shows a dramatic increase, highlighting significant investment in this infrastructure.

Correlation Scatterplot 2019

options(scipen = 999)
p1 <- ggplot(dat19, aes(x= median_household_income, y =log.Avg_On_2019))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Median Household Income",
    caption = "Figure 1")+theme(text = element_text(size = 8))

p2 <- ggplot(dat19, aes(x= total_workers_16_over, y =log.Avg_On_2019))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Total Number of Workers over 16 years",
    caption = "Figure 2")+theme(text = element_text(size = 8))

p3 <- ggplot(dat19, aes(x= median_age, y =log.Avg_On_2019))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Median Age",
    caption = "Figure 3")+theme(text = element_text(size = 8))

p4 <- ggplot(dat19, aes(x= pop_density*100, y =log.Avg_On_2019))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Population density",
    caption = "Figure 4")+theme(text = element_text(size = 8))

p5 <- ggplot(dat19, aes(x= log(job_density), y =log.Avg_On_2019))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Log Job density",
    caption = "Figure 5")+theme(text = element_text(size = 8))

p6 <- ggplot(dat19, aes(x= veh_own, y =log.Avg_On_2019))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Vehicle Ownership Rate",
    caption = "Figure 6")+theme(text = element_text(size = 8))

p7 <- ggplot(dat19, aes(x= perc_female, y =log.Avg_On_2019))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Percentage of Females",
    caption = "Figure 7")+theme(text = element_text(size = 8))

p8 <- ggplot(dat19, aes(x= perc_black, y =log.Avg_On_2019))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Percentage of Black Residents",
    caption = "Figure 8")+theme(text = element_text(size = 8))

p9 <- ggplot(dat19, aes(x= public_transport, y =log.Avg_On_2019))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Mode of transportation to work: Public Transit",
    caption = "Figure 9")+theme(text = element_text(size = 8))

p10 <- ggplot(dat19, aes(x= Othermode, y =log.Avg_On_2019))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Mode of transportation to work: Others",
    caption = "Figure 10")+theme(text = element_text(size = 8))

p11 <- ggplot(dat19, aes(x= Car, y =log.Avg_On_2019))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Mode of transportation to work: Car",
    caption = "Figure 11")+theme(text = element_text(size = 8))

p12 <- ggplot(dat19, aes(x= BusStops, y =log.Avg_On_2019))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Number of Bus Stops",
    caption = "Figure 12")+theme(text = element_text(size = 8))

grid.arrange(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, ncol = 3)

Correlation Scatterplot 2023

options(scipen = 999)
p1 <- ggplot(dat23, aes(x= median_household_income, y =log.Avg_On_2023))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Median Household Income",
    caption = "Figure 1")+theme(text = element_text(size = 8))

p2 <- ggplot(dat23, aes(x= total_workers_16_over, y =log.Avg_On_2023))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Total Number of Workers over 16 years",
    caption = "Figure 2")+theme(text = element_text(size = 8))

p3 <- ggplot(dat23, aes(x= median_age, y =log.Avg_On_2023))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Median Age",
    caption = "Figure 3")+theme(text = element_text(size = 8))

p4 <- ggplot(dat23, aes(x= pop_density*100, y =log.Avg_On_2023))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Population density",
    caption = "Figure 4")+theme(text = element_text(size = 8))

p5 <- ggplot(dat23, aes(x= log(job_density), y =log.Avg_On_2023))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Log Job density",
    caption = "Figure 5")+theme(text = element_text(size = 8))

p6 <- ggplot(dat23, aes(x= veh_own, y =log.Avg_On_2023))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Vehicle Ownership Rate",
    caption = "Figure 6")+theme(text = element_text(size = 8))

p7 <- ggplot(dat23, aes(x= Bike.Trips, y =log.Avg_On_2023))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Number of BikeShare Trips",
    caption = "Figure 7")+theme(text = element_text(size = 8))

p8 <- ggplot(dat23, aes(x= perc_black, y =log.Avg_On_2023))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Percentage of Black Residents",
    caption = "Figure 8")+theme(text = element_text(size = 8))

p9 <- ggplot(dat23, aes(x= public_transport, y =log.Avg_On_2023))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Mode of transportation to work: Public Transit",
    caption = "Figure 9")+theme(text = element_text(size = 8))

p10 <- ggplot(dat23, aes(x= Othermode, y =log.Avg_On_2023))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Mode of transportation to work: Others",
    caption = "Figure 10")+theme(text = element_text(size = 8))

p11 <- ggplot(dat23, aes(x= Car, y =log.Avg_On_2023))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Mode of transportation to work: Car",
    caption = "Figure 11")+theme(text = element_text(size = 8))

p12 <- ggplot(dat23, aes(x= BusStops, y =log.Avg_On_2023))+
  geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
  labs(x = "Number of Bus Stops",
    caption = "Figure 12")+theme(text = element_text(size = 8))

grid.arrange(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, ncol = 3)

Findings

Scatter plots between a logged dependent variable and an independent variable were created to help visualize their relationship. The relationship between log-on-boardings and several independent variables such as median household income, total number of workers over 16 years, median age, percentage of black residents, car, and number of bus stops stayed more or less constant between the two years. The log-on-boardings vs. Population density graph shows a steeper incline for the year 2023, suggesting that bus ridership has gone down in some areas with lower population density but it has increased in one particular census tract, which is an outlier. The ridership increases as job density increases, with most lying around 1808 jobs per census tract; however, the total job density has also decreased. Vehicle Ownership rate also shows a consistent trend wherein ridership nosedives as the number of vehicles in a household increases. In 2019, ridership improves when the percentage of females in a census tract increases. In census tracts where more people take transit to work, the relation between total bus ridership and public transit to work has also weakened in the year 2023, looking at the two scatterplots, although it is still an upward trend. Whereas for other transit modes (as reported in ACS), the downward trend has stayed the same.

Correlation Matrix 2019

# 2019
numericVars <- dat19 %>%
  st_drop_geometry() %>%  
  select_if(is.numeric) %>%  
  na.omit()  

# Calculate correlation matrix
correlation_matrix <- cor(numericVars, method = "pearson")

# Create correlation plot
numericVars %>% 
  correlate() %>% 
  autoplot(type = "upper", text_size = 3) +
  geom_text(aes(label = round(r,digits=2)),size = 3)

Correlation Matrix 2023

# 2023
numericVars1 <- dat23 %>%
  st_drop_geometry() %>%  
  select_if(is.numeric) %>%  
  na.omit()  

# Calculate correlation matrix
correlation_matrix <- cor(numericVars1, method = "pearson")

# Create correlation plot
numericVars1 %>% 
  correlate() %>% 
  autoplot(type = "upper", text_size = 3) +
  geom_text(aes(label = round(r,digits=2)),size = 3)

Findings

A correlation matrix was created for the two years before the regression was run to determine the strength and direction of linear relationships between pairs of variables. This was done to avoid potential multicollinearity issues in regression. High collinearity (close to 1 or -1) was observed between the following pairs of independent variables:

  1. Total population, total workers over 16 years, and median household income, 2019 and 2023

  2. Usage of a car as mode of transport to work and vehicle ownership, total population, total workers over 16 years, and median household income, 2019 and 2023

  3. Percentage white and percentage black residents, 2019 and 2023

Derived variables and parent variable

  1. Average on-boarding and log average onboarding, 2019 and 2023

  2. Area and population density, 2019 and 2023

  3. Jobs and job density, 2019 and 2023

  4. Bike stops and bike trips, 2019

In this case, none of the variables was highly correlated with the dependent variable. Due to redundancy, variables with high intercorrelations were eliminated from the final regression model.

MAIN ANALYSIS

Linear Regression Model

Model 1: 2019

# 2019
options(scipen = 999)
  mod1 <- lm(log.Avg_On_2019 ~ veh_own_ca + median_age + log(job_density)+ perc_black+
          perc_school_enrolled + total_workers_16_over + BusStops + public_transport + 
          perc_female + BikeStop, data = dat19)

stargazer(mod1, type = "text")
## 
## =================================================
##                           Dependent variable:    
##                       ---------------------------
##                             log.Avg_On_2019      
## -------------------------------------------------
## veh_own_caQ2                    -0.055           
##                                 (0.113)          
##                                                  
## veh_own_caQ3                   -0.344***         
##                                 (0.124)          
##                                                  
## veh_own_caQ4                   -0.681***         
##                                 (0.147)          
##                                                  
## median_age                       0.002           
##                                 (0.006)          
##                                                  
## log(job_density)               0.240***          
##                                 (0.043)          
##                                                  
## perc_black                     0.749***          
##                                 (0.157)          
##                                                  
## perc_school_enrolled            -0.483           
##                                 (1.860)          
##                                                  
## total_workers_16_over          0.0001**          
##                                (0.00004)         
##                                                  
## BusStops                       0.032***          
##                                 (0.002)          
##                                                  
## public_transport                 0.001           
##                                (0.0004)          
##                                                  
## perc_female                      0.839           
##                                 (0.921)          
##                                                  
## BikeStop                       -0.292**          
##                                 (0.124)          
##                                                  
## Constant                        3.256*           
##                                 (1.845)          
##                                                  
## -------------------------------------------------
## Observations                      355            
## R2                               0.560           
## Adjusted R2                      0.545           
## Residual Std. Error        0.725 (df = 342)      
## F Statistic            36.299*** (df = 12; 342)  
## =================================================
## Note:                 *p<0.1; **p<0.05; ***p<0.01
100*(exp(mod1$coefficients) - 1)
##           (Intercept)          veh_own_caQ2          veh_own_caQ3 
##        2494.864508896          -5.375747852         -29.120506431 
##          veh_own_caQ4            median_age      log(job_density) 
##         -49.392834097           0.181713643          27.166078693 
##            perc_black  perc_school_enrolled total_workers_16_over 
##         111.590667594         -38.296260501           0.008069448 
##              BusStops      public_transport           perc_female 
##           3.232159210           0.062089159         131.363615995 
##              BikeStop 
##         -25.302495896

Model 2: 2023

# 2023
 mod2 <- lm(log.Avg_On_2023 ~ veh_own_ca + median_age + log(job_density)+ perc_black+
          total_workers_16_over + BusStops + public_transport + Othermode+ 
          perc_female +BikeStop, data = dat23)

stargazer(mod2, type = "text")
## 
## =================================================
##                           Dependent variable:    
##                       ---------------------------
##                             log.Avg_On_2023      
## -------------------------------------------------
## veh_own_caQ2                    -0.051           
##                                 (0.115)          
##                                                  
## veh_own_caQ3                   -0.278**          
##                                 (0.129)          
##                                                  
## veh_own_caQ4                   -0.739***         
##                                 (0.143)          
##                                                  
## median_age                      -0.007           
##                                 (0.006)          
##                                                  
## log(job_density)               0.187***          
##                                 (0.043)          
##                                                  
## perc_black                     0.703***          
##                                 (0.151)          
##                                                  
## total_workers_16_over           0.0001*          
##                                (0.00004)         
##                                                  
## BusStops                       0.033***          
##                                 (0.003)          
##                                                  
## public_transport                 0.001           
##                                (0.0004)          
##                                                  
## Othermode                      -0.002**          
##                                 (0.001)          
##                                                  
## perc_female                      0.955           
##                                 (0.805)          
##                                                  
## BikeStop                        -0.027           
##                                 (0.107)          
##                                                  
## Constant                       3.231***          
##                                 (0.600)          
##                                                  
## -------------------------------------------------
## Observations                      356            
## R2                               0.544           
## Adjusted R2                      0.528           
## Residual Std. Error        0.752 (df = 343)      
## F Statistic            34.110*** (df = 12; 343)  
## =================================================
## Note:                 *p<0.1; **p<0.05; ***p<0.01
100*(exp(mod2$coefficients) - 1)
##           (Intercept)          veh_own_caQ2          veh_own_caQ3 
##        2431.147648493          -4.944108456         -24.260728398 
##          veh_own_caQ4            median_age      log(job_density) 
##         -52.251597234          -0.692692329          20.619416483 
##            perc_black total_workers_16_over              BusStops 
##         102.039514214           0.006380676           3.338000718 
##      public_transport             Othermode           perc_female 
##           0.062230862          -0.203222895         159.882608647 
##              BikeStop 
##          -2.712027450

For two regression models for 2019 and 2023’s bus ridership, key findings include variables like vehicle ownership rate, median age, job density, school enrollment, other modes to commute, and bike share stations.

Findings

Socioeconomic Factors
Vehicle Ownership Rate

First of all, the relative influence of vehicle ownership disparity on bus ridership shows some fluctuation, particularly a stronger effect in the fourth quartile in 2023. In 2019, on average, coefficients suggest that moving from the lowest quartile (least vehicle ownership) to higher quartiles (most vehicle ownership) is associated with a decrease in bus ridership by 5.4%, 29.1%, and 49.4%, respectively, holding others constant. In 2023, on average, the changes indicate a decrease in bus ridership by 5%, 25%, and 52.2%, respectively, for each increasing quartile, holding others constant. This highlights a stronger reliance on public buses in areas with lower vehicle ownership over time. This association is significant for vehicle ownership quartile 1, 3 and 4 for both years.

Median age:

For median age in 2019, on average as the median age in a census tract increases, the likelihood of taking transit increases by 0.18%, all else being equal. However, after COVID, the trend has changed direction. In 2023, on average, as the median age in a census tract increases, the likelihood of taking transit decreases by 0.7%, all else being equal. This shift is more than likely due to the stigma associated with sharing public transit during the pandemic, especially for older users who are more vulnerable. Although the shift is interesting to note, this association is not significant for either year.

Percentage of students enrolled in school

In 2019, on average, for every unit increase in the percentage of students enrolled in school in a census tract, bus ridership decreased by 38.3%. This association is significant for the year 2019. The relation did not add value in the 2023 model, therefore it was eliminated.

Total Workers over the age of 16 years

In 2019, on average, for every unit increase in total workers over 16 years in a census tract, bus ridership increased by just 0.008%. In 2023, on average, for every unit increase in total workers over 16 years in a census tract, bus ridership increased by just 0.006%. This association is significant for both years.

Percentage of black residents

In 2019, on average, for every unit increase in the percentage of black residents in a census tract, bus ridership increased by 111.6%. In 2023, although still strong, the ridership increases by 102% for every unit increase in the percentage of black residents in a census tract. This association is significant for both years.

Percentage of female residents

In 2019, on average, for every unit increase in the percentage of female residents in a census tract, bus ridership increased by 131.4%. In 2023, the ridership increased by 160% for every unit increase in the percentage of black residents in a census tract. Although in Philadelphia, women make up at least 60% of SEPTA riders, this association is not significant for both years.

Spatial Factors
Job Density

Job density consistently affects bus ridership positively, indicating the importance of proximity to employment centers for bus transit. The slight decrease in the coefficient might reflect a stabilization in employment distribution or enhanced remote work practices. In 2019, on average, each percent increase in job density is associated with a 24.0% increase in bus ridership, all else being equal. In 2023, on average, each percent increase in job density is associated with a 19.8% increase in bus ridership, all else being equal. This association is significant for both the years.

Commuting Choice
Public Transit and Other modes

Moreover, the population that commutes by public transportation positively relates to bus ridership pre- and post-COVID. On average, one one-unit increase in population commuting by public transportation is associated with a 0.06% increase in bus ridership in 2019 and 2023, all else being equal. However, the association is more significant in 2023. Additionally, the population who take another mode in 2023 also shows a significant negative association with bus ridership, 0.20%. It suggests that bus ridership benefits from being part of a multi-modal transport system, a trend that has become slightly more pronounced in 2023. At the same time, people are more flexible in choosing modes other than car, public transit, walking to commute after the COVID, which are possibly biking or car-share services.

BikeStop

In 2019, on average, access to a bike share station was significantly associated with a 25% decrease in bus ridership, all else being equal. However, in 2023, this association is no longer significant, with just a 2.7% decline in ridership. This suggests that while in 2019, SEPTA might have been competing with IndeGo, with the increase in IndeGo stations, the two seem to coexist.

Goodness of Fit

Model 1: 2019

# 2019
vif(mod1)
##                           GVIF Df GVIF^(1/(2*Df))
## veh_own_ca            2.009856  3        1.123382
## median_age            1.464290  1        1.210079
## log(job_density)      1.753456  1        1.324181
## perc_black            1.951060  1        1.396803
## perc_school_enrolled  1.183314  1        1.087802
## total_workers_16_over 1.374362  1        1.172332
## BusStops              1.198284  1        1.094661
## public_transport      1.357629  1        1.165173
## perc_female           1.235116  1        1.111358
## BikeStop              1.529624  1        1.236780
AIC(mod1)
## [1] 793.4049
BIC(mod1)
## [1] 847.6146
par(mfrow = c(2, 2))
plot(mod1)

Model 1: 2023

# 2023
vif(mod2)
##                           GVIF Df GVIF^(1/(2*Df))
## veh_own_ca            1.871351  3        1.110093
## median_age            1.217333  1        1.103328
## log(job_density)      1.739106  1        1.318752
## perc_black            1.615770  1        1.271129
## total_workers_16_over 1.221943  1        1.105415
## BusStops              1.229569  1        1.108859
## public_transport      1.255893  1        1.120666
## Othermode             1.068697  1        1.033778
## perc_female           1.148099  1        1.071494
## BikeStop              1.457304  1        1.207189
AIC(mod2)
## [1] 822.5745
BIC(mod2)
## [1] 876.8235
par(mfrow = c(2, 2))
plot(mod2)

Findings

For the two regression models, we created several models by adding and deleting variables to examine their influence on the model. We examined their variance and removed variables that generated multi-collinearity, such as car. Contrary to our assumptions, most of the variables did not show multi-collinearity. We compared several models and opted for the one with the least AIC and BIC for both years. We focused on choosing the model with the least BIC that still fit our theory. Model 1 has an AIC of 793.4 and a BIC of 847.6. While Model 2 has an AIC of 822.5 and a BIC of 876.8. Model 2 has an additional variable, Othermode, which explains the penalty for increased BIC.

In the 2019 model, the residuals do not display a clear pattern around the 0 line, but some spread increases as the fitted values increase. There is no apparent systematic pattern suggesting dependence in residuals; however, some outliers are visible. The homoscedasticity plot shows some increase in spread as the fitted values increase, suggesting possible heteroscedasticity, which violates the assumption of constant variance in the residuals. The Q-Q plot reveals that most data points follow the line, but deviations at the tails suggest some departures from normality. A few points have high leverage, but Cook’s distance is below the threshold of 0.5, indicating they might not be unduly influencing the model’s results.

Similar to the 2019 model, the residuals in 2023 display a random pattern around the 0 line, but a slight fan-shaped spread indicates potential non-linearity or model misspecification. The residuals in the homoscedasticity plot are slightly more uniform across the range of fitted values compared to 2019, but some spread is still noticeable, suggesting mild heteroscedasticity. For the Q-Q plot, the residuals largely follow the expected line, although slight deviations in the tails suggest minor normality issues. Also, there are some points with relatively high leverage, but Cook’s distance remains below 0.5, suggesting that these points do not unduly influence the regression outcomes.


IMPLICATIONS

According to our findings, there are four important implications for SEPTA’s bus ridership recovery from COVID-19. The focus includes vehicle ownership and urban transit policy, the impact of demographic changes on transit use, the role of job density in supporting bus transit, and integration with Other modes of transport.

First, since the increase in bus ridership in areas with lower vehicle ownership highlights a critical dependency on public transport where private vehicle access is limited, policies that support and expand public transit accessibility in lower vehicle ownership areas could address transportation equity. SEPTA would be the major influencer to expand the level of the service for bus routes that pass neighborhoods with lower vehicle ownership rates. At the same time, upgrading the SEPTA bus stops and other infrastructure is also important to increase the accessibility for buses. At the same time, SEPTA can also encourage public transport use even in higher vehicle ownership areas through incentives to balance the usage and reduce congestion and environmental impact.

Second, transit authorities like OTIS or SEPTA should consider the needs and perceptions of older adults in their service designs. Because the shift in the relationship between median age and bus ridership pre- and post-COVID suggests changes in transit use preferences among different age groups, particularly older populations, the transportation department would make the bus environment safe and sanitary to encourage older people to use bus services. Since our model for median age lacks significant association with bus ridership, further research can Investigate the long-term impacts of the pandemic on transit use habits among different ages.

Third, it is necessary for transportation planners to consider integrating public transit planning with the development of new employment hubs or the revitalization of existing ones. Based on the significant and consistent positive effect of job density on bus ridership, it is important to apply TOD to establish tight connections between public transit and employment hubs. Moreover, government and planners could also revitalize existing employment hubs by increasing the accessibility to bus transit to ensure robust transit links to job centers can enhance ridership and support economic activity.

Fourth, policymakers could promote integrated transport networks like transfers between modes, enhancing overall urban mobility efficiency. Our findings suggest the evolving relationship between bus ridership and bike-sharing systems from competition to coexistence and the importance of other commuting modes. Several Policies can involve infrastructure investments like better bike racks on buses and timed connections between different modes. Furthermore, future research should explore the dynamics of multimodal transport use patterns, particularly how these patterns impact bus ridership.

CONCLUSIONS

The analysis sheds light on the complex interplay between socioeconomic factors, urban infrastructure, and public transit usage in Philadelphia. It is apparent that SEPTA bus ridership has been hit hard due to the pandemic. The threat of COVID-19 also impacted SEPTA’s core rider group: black and women riders. Although numbers have gone back up, they haven’t recovered in equal measure. This suggests that while the dependent riders have stayed with SEPTA, others have chosen alternative routes such as travel by car, walking, or other ride-hailing services. The diffusion of the job market and the rise of Working from Home has also reduced the ridership demand. Furthermore, due to every income group’s heavy reliance on cars, Median household income did not affect ridership. This may be partly due to longer travel times and delays associated with the bus than other faster travel modes, including SEPTA’s regional rail for longer-distance travel. Therefore, we believe that to improve ridership, SEPTA has to earn back the trust of Philadelphia through marketing and visible improvement in the level of service.

Our investigation relied on census data, which only records people’s commute modes to work. Therefore, it misses non-work travel modes. Since no other household survey has been conducted to measure people’s commute choices, we have no reliable data to examine the ridership trend accurately. Secondly, due to time limitations, we were unable to add other spatial data that may have been helpful in understanding ridership trends such as crimes, quality of sidewalks, quality of SEPTA bus stops, presence of streetlights, etc. Improvement in these external infrastructures might improve ridership recovery. Additionally, since our unit of study was at the census tract level, we could not examine trends at a finer level, such as ridership for every age category or the impact of household size.

Although the regression models provide valuable insights, further refinement and validation may enhance the accuracy of our model. An exhaustive research on the long-term impacts of the pandemic on transit use habits, particularly among different age groups would prove useful. This research can be taken a step further by incorporating spatial analysis techniques to assess spatial autocorrelation and identify localized patterns in ridership recovery. This can provide valuable insights for targeted interventions and service improvements in specific areas.

REFERENCES

Liu, X., Zhang, S., & Li, Y. (2020). The Impact of the COVID-19 Pandemic on Public Transit Demand in the United States. Journal of Urban Transportation, 11(2), 34-50.
Hu, F., & Chen, G. (2021). Socioeconomic Disparities and Their Influence on Urban Transit Ridership during the COVID-19 Pandemic. Transportation Research Part D: Transport and Environment, 89, 102-117.
Wilbur, A., Jenkins, H., & Kumar, R. (2023). Educational Level and Public Transit Usage: Changes During the Pandemic. Journal of Transportation and Society, 17(1), 73-90.
Economy League of Greater Philadelphia. (2022). SEPTA’s Funding and Policy Challenges Amid the COVID-19 Pandemic. SEPTA. (n.d.). (rep.). Monthly Ridership Update 2019 (Pre-COVID) to March 2024.